Efficient Algorithms for Regular Expression Constrained Sequence Alignment

Authors

  • Yun Sheng Chung
  • Chin Lung Lu
  • Chuan Yi Tang
Abstract

Imposing constraints is an effective means to incorporate biological knowledge into alignment procedures. As in the PROSITE database, functional sites of proteins can be effectively described as regular expressions. In an alignment of protein sequences it is natural to expect that functional motifs should be aligned together. Due to this motivation, Arslan introduced the regular expression constrained sequence alignment problem and proposed an algorithm which, if implemented naïvely, can take time and space up to O(|Σ|²|V|⁴n²) and O(|Σ|²|V|⁴n), respectively, where Σ is the alphabet, n is the sequence length, and V is the set of states in an automaton equivalent to the input regular expression. In this paper we propose a more efficient algorithm solving this problem which takes O(|V|³n²) time and O(|V|²n) space in the worst case. If |V| = O(log n) we propose another algorithm with time complexity O(|V|² log|V| n²). The time complexity of our algorithms is independent of Σ, which is desirable in protein applications where the formulation of this problem originates; a factor of |Σ|² = 400 in the time complexity of the previously proposed algorithm would significantly affect the efficiency in practice. © 2007 Elsevier B.V. All rights reserved.
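
To make the problem statement concrete, here is a minimal brute-force sketch in Python. It is not the algorithm of this paper or of Arslan; it only enumerates the definition directly, assuming the problem asks for the best global alignment score over all alignments in which some substring of each sequence matches the regular expression and the two substrings are aligned with each other. The function names and the scoring scheme (match +1, mismatch -1, linear gap -1) are illustrative assumptions.

```python
import re

def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score (assumed linear gap cost)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]

def matching_spans(s, rx):
    """All (start, end) pairs such that s[start:end] fully matches rx."""
    return [(i, j)
            for i in range(len(s) + 1)
            for j in range(i, len(s) + 1)
            if rx.fullmatch(s[i:j])]

def constrained_score(s1, s2, pattern):
    """Brute-force reference: best score over all constrained alignments.

    Returns None when no pair of substrings matches the pattern.
    Exponentially simpler than, and far slower than, an automaton-based DP.
    """
    rx = re.compile(pattern)
    best = None
    for i, j in matching_spans(s1, rx):
        for k, l in matching_spans(s2, rx):
            # Score is additive over the prefix, the motif segment, and the suffix.
            total = (global_score(s1[:i], s2[:k])
                     + global_score(s1[i:j], s2[k:l])
                     + global_score(s1[j:], s2[l:]))
            if best is None or total > best:
                best = total
    return best
```

For example, `constrained_score("AAB", "BAA", "B")` forces the two `B` characters into the same column, so the result is lower than the unconstrained `global_score("AAB", "BAA")`. A reference of this kind is only practical for very short strings; its use is as a correctness check for faster implementations.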

Related articles

SA-REPC - Sequence Alignment with Regular Expression Path Constraint

In this paper, we define a novel variation on the constrained sequence alignment problem, the Sequence Alignment with Regular Expression Path Constraint problem, in which the constraint is given in the form of a regular expression. Our definition extends and generalizes the existing definitions of alignment-path constrained sequence alignments to the expressive power of regular expressions. We ...

Regular Language Constrained Sequence Alignment Revisited

Imposing constraints in the form of a finite automaton or a regular expression is an effective way to incorporate additional a priori knowledge into sequence alignment procedures. With this motivation, the Regular Expression Constrained Sequence Alignment Problem was introduced, which proposed an O(n²t⁴) time and O(n²t²) space algorithm for solving it, where n is the length of the input strings...

Regular Expression Constrained Sequence Alignment

Given strings S1, S2, and a regular expression R, we introduce regular expression constrained sequence alignment as the problem of finding the maximum alignment score between S1 and S2 over all alignments such that in these alignments there exists a segment where some substring s1 of S1 is aligned with some substring s2 of S2, and both s1 and s2 match R, i.e. s1, s2 ∈ L(R) where L(R) is the reg...

gpALIGNER: A Fast Algorithm for Global Pairwise Alignment of DNA Sequences

Bioinformatics, through the sequencing of the full genomes for many species, is increasingly relying on efficient global alignment tools exhibiting both high sensitivity and specificity. Many computational algorithms have been applied for solving the sequence alignment problem. Dynamic programming, statistical methods, approximation and heuristic algorithms are the most common methods appli...

Detecting conserved secondary structures in RNA molecules using constrained structural alignment

Constrained sequence alignment has been studied extensively in the past. Different forms of constraints have been investigated, where a constraint can be a subsequence, a regular expression, or a probability matrix of symbols and positions. However, constrained structural alignment has been investigated to a much lesser extent. In this paper, we present an efficient method for constrained struc...

Journal:
  • Inf. Process. Lett.

Volume: 103

Publication date: 2006